Multiword expressions in spoken language: An exploratory study on pronunciation variation
Authors: D. Binnenpoorte et al.
Abstract
The study presented in this paper was aimed at exploring the possibilities of modelling specific pronunciation characteristics of multiword expressions (MWEs) for both automatic speech recognition (ASR) and automatic phonetic transcription (APT). For this purpose, we first drew up an inventory of frequently found N-grams extracted from orthographic transcriptions of spontaneous speech contained in a large corpus of spoken Dutch. These N-grams were filtered and subsequently assigned to linguistic categories. For a small selection of these N-grams we examined the phonetic transcriptions contained in the corpus. We found that the pronunciation of these N-grams differed to a large extent from the canonical form. In order to determine whether this is a general characteristic of spontaneous speech or rather the effect of the specific status of these N-grams, we analysed the pronunciations of the individual words composing the N-grams in two context conditions: (1) in the N-gram context and (2) in any other context. We found that words in N-grams do indeed have peculiar pronunciation patterns. This suggests that the N-grams investigated may be considered MWEs that should be treated as lexical entries, with their own specific pronunciation variants, in the pronunciation lexicons used in ASR and APT.
© 2005 Elsevier Ltd. All rights reserved. doi:10.1016/j.csl.2004.11.003
D. Binnenpoorte et al. / Computer Speech and Language 19 (2005) 433–449
Related articles
Analyzing and identifying multiword expressions in spoken language
The present paper investigates multiword expressions (MWEs) in spoken language and possible ways of identifying MWEs automatically in speech corpora. Two MWEs that emerged from previous studies and that occur frequently in Dutch are analyzed to study their pronunciation characteristics and compare them to those of other utterances in a large speech corpus. The analyses reveal that these MWEs ...
Multiword expressions in spontaneous speech: do we really speak like that?
In this study, we examined the pronunciation characteristics of multiword expressions (MWEs). We first drew up an inventory of frequently occurring N-grams extracted from orthographic transcriptions of spontaneous speech contained in a large corpus of spoken Dutch. For about 10% of these N-grams phonetic transcriptions were available, which were examined. Our results show that the pronunciation ...
Pragmatic expressions in cross-linguistic perspective
This paper focuses on some pragmatic expressions that are characteristic of informal spoken English, their possible equivalents in some other languages, and their use by EFL learners from different backgrounds. These expressions, called general extenders (e.g. and stuff, or something), are shown to be different from discourse markers and to exhibit variation in form, funct...
Gender in everyday speech and language: a corpus-based study
This paper presents an exploratory study on the relations between gender and everyday parlance. A “data-mining” approach is used to explore gender-specific characteristics in a large number of spontaneous telephone and face-to-face conversations. Our study focuses on speech rate (speaking rate and articulation rate), disfluencies (filled pauses and repetitions), pronunciation variation (phoneme...
A Corpus-Driven Study of the Variation of Co-Occurrence Patterns in Written and Spoken Registers
This paper will focus on the study of the variation of co-occurrence patterns encountered in written and spoken registers, through the analysis of a large lexical database of corpus-extracted multiword expressions (MWEs) of European Portuguese. Those MWEs were automatically extracted from a balanced 50 million word written corpus and a 1 million word spoken corpus, furthermore statistically int...
Journal title:
- Computer Speech & Language
Volume 19
Pages 433–449
Publication date 2005